Efficient, sparse representation of manifold distance matrices for classical scaling
Geodesic distance matrices can reveal shape properties that are largely
invariant to non-rigid deformations, and thus are often used to analyze and
represent 3-D shapes. However, these matrices grow quadratically with the
number of points. Thus for large point sets it is common to use a low-rank
approximation to the distance matrix, which fits in memory and can be
efficiently analyzed using methods such as multidimensional scaling (MDS). In
this paper we present a novel sparse method for efficiently representing
geodesic distance matrices using biharmonic interpolation. This method exploits
knowledge of the data manifold to learn a sparse interpolation operator that
approximates distances using a subset of points. We show that our method is 2x
faster and uses 20x less memory than current leading methods for solving MDS on
large point sets, with similar quality. This enables analyses of large point
sets that were previously infeasible.
Comment: Conference CVPR 201
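For orientation, the dense baseline that the abstract refers to as classical scaling can be sketched as follows. This is a minimal illustration of textbook classical MDS on a full distance matrix, not the sparse biharmonic method the paper proposes. The function name `classical_mds` and the toy square are assumptions, and Euclidean distances stand in for geodesic ones.

```python
import numpy as np

def classical_mds(dist, dim=2):
    """Embed points in `dim` dimensions from a full pairwise distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip small negative eigenvalues (round-off or non-Euclidean input).
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Toy data (assumption): the four corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dist, dim=2)
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
```

The n-by-n matrix `dist` is the object that grows quadratically with the number of points, which is the cost the abstract's sparse representation is meant to avoid.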
Selecting Informative Contexts Improves Language Model Finetuning
We present a general finetuning meta-method that we call information gain
filtration for improving the overall training efficiency and final performance
of language model finetuning. This method uses a secondary learner which
attempts to quantify the benefit of finetuning the language model on each given
example. During the finetuning process, we use this learner to decide whether
or not each given example should be trained on or skipped. We show that it
suffices for this learner to be simple and that the finetuning process itself
is dominated by the relatively trivial relearning of a new unigram frequency
distribution over the modelled language domain, a process which the learner
aids. Our method trains to convergence using 40% fewer batches than normal
finetuning, and achieves a median perplexity of 54.0 on a books dataset
compared to a median perplexity of 57.3 for standard finetuning using the same
neural architecture.
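The train-or-skip decision described above can be sketched as a simple filter over batches. This is a minimal sketch under stated assumptions: `filter_batches`, `toy_gain`, and the threshold value are hypothetical, and the toy scoring rule is only a placeholder for the secondary learner, whose actual form the abstract does not specify.

```python
from typing import Callable, Iterable, List, Sequence

def filter_batches(
    batches: Iterable[Sequence[str]],
    predict_gain: Callable[[Sequence[str]], float],
    threshold: float,
) -> List[Sequence[str]]:
    """Keep only the batches whose predicted benefit exceeds `threshold`."""
    kept = []
    for batch in batches:
        if predict_gain(batch) > threshold:
            kept.append(batch)  # a real loop would take a gradient step here
        # otherwise the batch is skipped without a parameter update
    return kept

def toy_gain(batch: Sequence[str]) -> float:
    """Placeholder scorer (assumption): fraction of words outside a tiny stop list."""
    common = {"the", "a", "of", "and"}
    words = [w for text in batch for w in text.split()]
    return sum(w not in common for w in words) / max(len(words), 1)

batches = [["the of and a"], ["dragons guard the ancient library"], ["a the"]]
kept = filter_batches(batches, toy_gain, threshold=0.5)
```

In this toy run only the second batch passes the filter, so two of the three batches would be skipped.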
On MMSE and MAP Denoising Under Sparse Representation Modeling Over a Unitary Dictionary
Among the many ways to model signals, a recent approach that draws
considerable attention is sparse representation modeling. In this model, the
signal is assumed to be generated as a random linear combination of a few atoms
from a pre-specified dictionary. In this work we analyze two Bayesian denoising
algorithms -- the Maximum-Aposteriori Probability (MAP) and the
Minimum-Mean-Squared-Error (MMSE) estimators, under the assumption that the
dictionary is unitary. It is well known that both these estimators lead to a
scalar shrinkage on the transformed coefficients, albeit with a different
response curve. In this work we start by deriving closed-form expressions for
these shrinkage curves and then analyze their performance. Upper bounds on the
MAP and the MMSE estimation errors are derived. We tie these to the error
obtained by a so-called oracle estimator, where the support is given,
establishing a worst-case gain-factor between the MAP/MMSE estimation errors
and the oracle's performance. These denoising algorithms are demonstrated on
synthetic signals and on true data (images).
Comment: 29 pages, 10 figures
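The structure shared by both estimators, a scalar shrinkage applied to the transformed coefficients, can be sketched as follows. This is a minimal illustration, assuming a random real orthogonal matrix as the unitary dictionary and a generic hard threshold as the shrinkage curve. It is not the closed-form MAP or MMSE curve derived in the paper, and all names are hypothetical.

```python
import numpy as np

def shrinkage_denoise(noisy, dictionary, shrink):
    """Denoise by shrinking each coefficient in a unitary dictionary."""
    coeffs = dictionary.T @ noisy       # analysis: the inverse of a unitary matrix is its transpose
    return dictionary @ shrink(coeffs)  # synthesis from the shrunk coefficients

def hard_threshold(t):
    """Stand-in shrinkage curve (assumption): zero out coefficients with magnitude <= t."""
    return lambda c: np.where(np.abs(c) > t, c, 0.0)

rng = np.random.default_rng(0)
# Random orthogonal dictionary obtained from a QR factorization.
dictionary, _ = np.linalg.qr(rng.normal(size=(8, 8)))
alpha = np.zeros(8)
alpha[[1, 5]] = [3.0, -2.0]             # sparse representation with two active atoms
clean = dictionary @ alpha
noisy = clean + 0.1 * rng.normal(size=8)
denoised = shrinkage_denoise(noisy, dictionary, hard_threshold(1.0))
```

Swapping `hard_threshold` for another elementwise function changes the response curve without changing the rest of the pipeline, which is the sense in which the two estimators differ only in their shrinkage curves.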
Humans and language models diverge when predicting repeating text
Language models that are trained on the next-word prediction task have been
shown to accurately model human behavior in word prediction and reading speed.
In contrast with these findings, we present a scenario in which the performance
of humans and LMs diverges. We collected a dataset of human next-word
predictions for five stimuli that are formed by repeating spans of text. Human
and GPT-2 LM predictions are strongly aligned in the first presentation of a
text span, but their performance quickly diverges when memory (or in-context
learning) begins to play a role. We traced the cause of this divergence to
specific attention heads in a middle layer. Adding a power-law recency bias to
these attention heads yielded a model that performs much more similarly to
humans. We hope that this scenario will spur future work in bringing LMs closer
to human behavior.
Comment: To appear in the 26th Conference on Computational Natural Language
Learning (CoNLL 2023). Code and data are available at
https://github.com/HuthLab/lm-repeating-tex
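One way to picture a power-law recency bias on an attention head is sketched below. The abstract does not give the exact form used in the paper, so this is an assumption: each attention weight is multiplied by (distance + 1) ** -alpha, which is the same as subtracting alpha * log(distance + 1) from the score before the softmax. The function name and the parameter `alpha` are hypothetical.

```python
import math

def recency_biased_attention(scores, alpha):
    """Softmax over attention scores with a power-law recency prior.

    scores[j] is the raw score of the current query against key position j,
    and the last entry is the most recent key.
    """
    n = len(scores)
    biased = [s - alpha * math.log((n - 1 - j) + 1) for j, s in enumerate(scores)]
    peak = max(biased)                       # subtract the max for numerical stability
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

uniform = recency_biased_attention([0.0, 0.0, 0.0, 0.0], alpha=0.0)
biased = recency_biased_attention([0.0, 0.0, 0.0, 0.0], alpha=1.0)
```

With equal raw scores, `alpha=0.0` leaves the weights uniform, while `alpha=1.0` shifts weight toward the most recent positions.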
Synthesizing Programs with Continuous Optimization
Automatic software generation based on some specification is known as program
synthesis. Most existing approaches formulate program synthesis as a search
problem with discrete parameters. In this paper, we present a novel formulation
of program synthesis as a continuous optimization problem and use a
state-of-the-art evolutionary approach, the Covariance Matrix Adaptation
Evolution Strategy (CMA-ES), to solve it. We then propose a mapping scheme to
convert the continuous formulation into actual programs. We compare our system,
called GENESYS, with several recent program synthesis techniques (in both
discrete and continuous domains) and show that GENESYS synthesizes more
programs within a fixed time budget than these existing schemes. For example,
for programs of length 10, GENESYS synthesizes 28% more programs than the
existing schemes within the same time budget.
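The idea of mapping a continuous vector to a discrete program can be sketched as follows. The abstract does not describe the mapping scheme GENESYS uses, so everything here is hypothetical: the three-instruction DSL, the sigmoid-and-binning `decode`, and the mismatch-count `loss`. A continuous optimizer such as CMA-ES would search over `vector` to minimize `loss`, and that search loop is omitted.

```python
import math

# Hypothetical three-instruction DSL acting on a single integer.
OPS = [
    ("inc", lambda x: x + 1),
    ("double", lambda x: x * 2),
    ("negate", lambda x: -x),
]

def decode(vector):
    """Map each real coordinate to an instruction index via a sigmoid and binning."""
    program = []
    for v in vector:
        unit = 1.0 / (1.0 + math.exp(-v))  # squash to the open interval (0, 1)
        program.append(min(int(unit * len(OPS)), len(OPS) - 1))
    return program

def run(program, x):
    """Apply the decoded instructions to x in order."""
    for index in program:
        x = OPS[index][1](x)
    return x

def loss(vector, examples):
    """Number of input/output examples the decoded program gets wrong."""
    program = decode(vector)
    return sum(run(program, x) != y for x, y in examples)

examples = [(1, 4), (3, 8)]   # specification: (x + 1) * 2
candidate = [-2.0, 0.0]       # decodes to "inc" followed by "double"
```

Here `candidate` satisfies both examples, while a vector such as `[3.0, 0.0]` decodes to "negate" followed by "double" and fails both.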
A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions
The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We present AutoPerf, a novel approach to automate regression testing that utilizes three core techniques: (i) zero-positive learning, (ii) autoencoders, and (iii) hardware telemetry. We demonstrate AutoPerf’s generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs. On average, AutoPerf exhibits 4% profiling overhead and accurately diagnoses more performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has produced no false negatives.
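Zero-positive learning, as named above, means fitting a model on non-regressed runs only and flagging anything it reconstructs poorly. A minimal sketch follows, with loud assumptions: PCA serves as a linear stand-in for the autoencoder, the two "counters" are synthetic rather than real hardware telemetry, and the threshold rule (maximum training error) is hypothetical. None of this is AutoPerf's actual implementation.

```python
import numpy as np

def fit_linear_autoencoder(normal, n_components):
    """Fit a PCA projection (linear stand-in for an autoencoder) on normal runs only."""
    mean = normal.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(samples, mean, components):
    """Distance between each sample and its projection onto the learned subspace."""
    centered = samples - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

rng = np.random.default_rng(0)
# Synthetic data (assumption): the second counter tracks twice the first.
base = rng.normal(size=(200, 1))
normal = np.hstack([base, 2.0 * base]) + 0.01 * rng.normal(size=(200, 2))

mean, components = fit_linear_autoencoder(normal, n_components=1)
threshold = reconstruction_error(normal, mean, components).max()

# A run whose counters break the learned relationship.
regressed = np.array([[1.0, -2.0]])
is_anomaly = reconstruction_error(regressed, mean, components) > threshold
```

No regressed example is needed at training time, which is the appeal of the zero-positive setting.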
Clutter Mitigation in Echocardiography Using Sparse Signal Separation
In ultrasound imaging, clutter artifacts degrade images and may cause inaccurate
diagnosis. In this paper, we apply a method called Morphological Component
Analysis (MCA) for sparse signal separation with the objective of reducing such
clutter artifacts. The MCA approach assumes that the two signals in the additive
mix each have a sparse representation under some dictionary of atoms (a matrix),
and separation is achieved by finding these sparse representations. In our work,
an adaptive approach is used for learning the dictionary from the echo data. MCA
is compared to Singular Value Filtering (SVF), a filtering technique based on
Principal Component Analysis (PCA), and to a high-pass Finite Impulse Response
(FIR) filter. Each filter is applied to a simulated hypoechoic lesion sequence,
as well as experimental cardiac ultrasound data. In both cases, MCA is shown to
outperform the FIR filter and to obtain results comparable to the SVF method in
terms of contrast-to-noise ratio (CNR). Furthermore, MCA shows a lower impact on
tissue sections while removing the clutter artifacts. On experimental heart
data, MCA achieves clutter mitigation with an average CNR improvement of 1.33 dB.
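For readers unfamiliar with the metric, a contrast-to-noise ratio in decibels can be computed as sketched below. Several CNR conventions exist and the abstract does not say which one the paper uses, so the formula here (absolute mean difference over the root of summed variances, converted with 20 log10) is an assumption, as are the function name and the toy pixel values.

```python
import math
from statistics import mean, pvariance

def cnr_db(region_a, region_b):
    """Contrast-to-noise ratio in dB between two regions of pixel intensities.

    Assumed definition: |mean_a - mean_b| / sqrt(var_a + var_b), in 20*log10 units.
    Both regions must not be constant at the same time (the noise term would be zero).
    """
    contrast = abs(mean(region_a) - mean(region_b))
    noise = math.sqrt(pvariance(region_a) + pvariance(region_b))
    return 20.0 * math.log10(contrast / noise)

tissue = [10, 12, 10, 12]   # toy intensities (assumption)
lesion = [2, 4, 2, 4]
value = cnr_db(tissue, lesion)
```

Under this convention, an improvement such as the one reported above would be the difference between the CNR values measured after and before filtering.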
Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets
The scale of functional magnetic resonance image data is rapidly increasing
as large multi-subject datasets are becoming widely available and
high-resolution scanners are adopted. The inherent low-dimensionality of the
information in this data has led neuroscientists to consider factor analysis
methods to extract and analyze the underlying brain activity. In this work, we
consider two recent multi-subject factor analysis methods: the Shared Response
Model and Hierarchical Topographic Factor Analysis. We perform analytical,
algorithmic, and code optimization to enable multi-node parallel
implementations to scale. Single-node improvements result in 99x and 1812x
speedups on these two methods, and enable the processing of larger datasets.
Our distributed implementations show strong scaling of 3.3x and 5.5x
respectively with 20 nodes on real datasets. We also demonstrate weak scaling
on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768
cores.
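The strong-scaling figures above can be turned into parallel efficiencies with a one-line calculation. This sketch assumes the speedups are measured against a single-node baseline and that "respectively" follows the order in which the two methods are introduced, and neither point is confirmed by the abstract. The function name is hypothetical.

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of ideal linear speedup achieved on `nodes` nodes."""
    return speedup / nodes

# Values quoted in the abstract: 3.3x and 5.5x on 20 nodes.
srm_efficiency = parallel_efficiency(3.3, 20)
htfa_efficiency = parallel_efficiency(5.5, 20)
```

Under these assumptions the two efficiencies come out at roughly 16.5% and 27.5% of ideal linear scaling.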